generate kernels #27

Open
BillJJ wants to merge 2 commits into main from bill/code_gen
Conversation

@BillJJ
Collaborator

@BillJJ BillJJ commented Mar 16, 2026

Only binary ops for now; I'll add more later. This should generate Metal code that looks like this:

kernel void op_3 (
    device float* out [[ buffer(0) ]],
    const device float* in0 [[ buffer(1) ]],
    const device float* in1 [[ buffer(2) ]],
    uint gid [[ thread_position_in_grid ]])
{
    constant long shape[] = {2,3};
    constant long strides_in0[] = {3,1};
    constant long strides_in1[] = {3,1};
    uint remaining = gid;
    uint idx_in0 = 0;
    uint idx_in1 = 0;
    for (int i = 1; i >= 0; i--) {
        uint coord = remaining % shape[i];
        idx_in0 += coord * strides_in0[i];
        idx_in1 += coord * strides_in1[i];
        remaining /= shape[i];
    }
    out[gid] = in0[idx_in0] * in1[idx_in1];
}
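
For anyone who wants to sanity-check the index math without a GPU, here is a rough CPU-side sketch in plain C++ that mirrors the loop above. The function name `strided_offset` and the use of `std::vector` are assumptions for illustration only; they are not part of the generator in this PR.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical CPU mirror of the kernel's index math (not code from this PR).
// Converts a flat output index (gid) into a strided input offset by peeling
// off one coordinate per dimension, innermost dimension first.
std::int64_t strided_offset(std::int64_t gid,
                            const std::vector<std::int64_t>& shape,
                            const std::vector<std::int64_t>& strides) {
    std::int64_t remaining = gid;
    std::int64_t idx = 0;
    for (int i = static_cast<int>(shape.size()) - 1; i >= 0; --i) {
        std::int64_t coord = remaining % shape[i];  // coordinate along dim i
        idx += coord * strides[i];                  // weight it by the stride
        remaining /= shape[i];                      // move on to the next dim
    }
    return idx;
}
```

With shape {2,3} and contiguous strides {3,1}, the offset should simply equal gid. With a transposed view (strides {1,2}, a made-up case that is not in this PR), gid 4 has coordinates (1,1) and should land on offset 3.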

@kellen-sun
Owner

Apparently constant long shape[] is fine and shouldn't result in arrays being created at runtime, because the Apple Metal compiler will just write the constants in for us.
We should watch out in the future, though, if these values ever stop being constant.

@BillJJ
Collaborator Author

BillJJ commented Mar 16, 2026

You should comment in the code so that I can reply in a thread. But yeah, that's the idea: Metal should unroll and optimize. I think we can check the output if we want, but that should be the case.
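
To make "unroll and optimize" concrete, here is a hand-written C++ sketch of what the loop reduces to for shape {2,3} and strides {3,1}. This is not actual Metal compiler output, just the arithmetic we would expect once the constants are folded in; the names `offset_loop` and `offset_unrolled` are made up for the comparison.

```cpp
#include <cstdint>

// Loop form, mirroring the generated kernel for shape {2,3}, strides {3,1}.
std::uint32_t offset_loop(std::uint32_t gid) {
    const long shape[] = {2, 3};
    const long strides[] = {3, 1};
    std::uint32_t remaining = gid;
    std::uint32_t idx = 0;
    for (int i = 1; i >= 0; --i) {
        std::uint32_t coord = static_cast<std::uint32_t>(remaining % shape[i]);
        idx += static_cast<std::uint32_t>(coord * strides[i]);
        remaining = static_cast<std::uint32_t>(remaining / shape[i]);
    }
    return idx;
}

// Hand-unrolled form with the constants substituted in (an assumption about
// what an optimizer could produce, not verified compiler output).
std::uint32_t offset_unrolled(std::uint32_t gid) {
    return (gid % 3) * 1 + ((gid / 3) % 2) * 3;
}
```

Both functions should agree for every gid, which is what we would want to confirm when we look at the real compiled output.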
